Seungone Kim

K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts

Jun 01, 2026
Via arXiv

Verus-SpecGym: An Agentic Environment for Evaluating Specification Autoformalization

May 26, 2026
Via arXiv

On the limits and opportunities of AI reviewers: Reviewing the reviews of Nature-family papers with 45 expert scientists

May 20, 2026
Via arXiv

Reasoning over mathematical objects: on-policy reward modeling and test time aggregation

Mar 19, 2026
Via arXiv

OptimalThinkingBench: Evaluating Over and Underthinking in LLMs

Aug 18, 2025
Via arXiv

Measuring Sycophancy of Language Models in Multi-turn Dialogues

May 28, 2025
Via arXiv

Let's Predict Sentence by Sentence

May 28, 2025
Via arXiv

FREESON: Retriever-Free Retrieval-Augmented Reasoning via Corpus-Traversing MCTS

May 22, 2025
Via arXiv

Web-Shepherd: Advancing PRMs for Reinforcing Web Agents

May 21, 2025
Via arXiv

Reasoning Models Better Express Their Confidence

May 20, 2025
Via arXiv